Speed up crsql_changes merges ~8x #7

Merged: sinkingsugar merged 2 commits into pure-c-port from perf/fast-changes-merge on Jun 13, 2026

@sinkingsugar

Why

Importing CRDT diffs (INSERT INTO crsql_changes) was slow enough that apps resorted to hand-rolled batching hacks. Profiling showed that each merged change paid for a full sqlite3_prepare/finalize cycle and ~7 redundant statement executions. The dominant cost, though, was that three hot statements used RETURNING, which makes SQLite materialize an ephemeral btree with its own pager and page cache on every executed row (visible as massive sys-time / VM churn). The clock-table RETURNING clause merely echoed back the key, a value already bound as parameter 1.

What

  • Drop RETURNING from all hot-path statements — winner clock (use the bound key), pk-lookaside inserts (__crsql_key is a rowid alias → sqlite3_last_insert_rowid), site-id ordinal insert, and the as_crr backfill path
  • No more per-row temp prepare: crsql_get_or_create_key_packed() binds unpacked PK ColumnValues directly into the cached key statements
  • Sync bit via direct int* (ExtData.syncBitPtr) instead of stepping SELECT crsql_internal_sync_bit(x) twice per change
  • Last-row memo (table, pk) → (lookaside key, causal length): changesets arrive ordered by (db_version, seq), so the N column changes of one row hit it back-to-back. Invalidated on commit/rollback hooks, savepoint rollback (new vtab xRollback/xRollbackTo, module iVersion 2), local-write trigger entry, table-info reloads, compact_post_alter, and any merge error (statement-journal undo)
  • Site-id → ordinal memo (changesets are virtually always single-site)
  • crsql_next_db_version computed in C and bound as a value; skips the PRAGMA data_version probe when pendingDbVersion is already established in the open transaction
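The last-row memo described above can be pictured as a single-entry cache. The following is a minimal sketch, not the code from this PR: the `RowMemo` struct, its field names, the 64-byte table-name limit, and the `row_memo_*` functions are all hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry memo: (table, packed pk) -> (key, causal length).
 * Names and layout are assumptions, not the extension's real structures. */
typedef struct RowMemo {
  int valid;
  char table[64];        /* assumed upper bound on cached table names */
  unsigned char *pk;     /* owned copy of the packed primary key */
  size_t pkLen;
  int64_t key;           /* lookaside key of the row */
  int64_t causalLength;
} RowMemo;

/* Drop the cached entry. The PR calls the equivalent from commit/rollback
 * hooks, savepoint rollback, local writes, and on any merge error. */
static void row_memo_invalidate(RowMemo *m) {
  free(m->pk);
  m->pk = NULL;
  m->pkLen = 0;
  m->valid = 0;
}

/* Returns 1 and fills key/causalLength on a hit, 0 on a miss. */
static int row_memo_lookup(const RowMemo *m, const char *table,
                           const void *pk, size_t pkLen,
                           int64_t *key, int64_t *causalLength) {
  if (!m->valid || m->pkLen != pkLen) return 0;
  if (strcmp(m->table, table) != 0) return 0;
  if (pkLen > 0 && memcmp(m->pk, pk, pkLen) != 0) return 0;
  *key = m->key;
  *causalLength = m->causalLength;
  return 1;
}

/* Replace the cached entry. Returns 1 if stored, 0 if nothing was cached
 * (name too long or allocation failure); the memo is left invalid then. */
static int row_memo_store(RowMemo *m, const char *table, const void *pk,
                          size_t pkLen, int64_t key, int64_t causalLength) {
  row_memo_invalidate(m);
  size_t tableLen = strlen(table);
  if (tableLen >= sizeof(m->table)) return 0;
  unsigned char *copy = NULL;
  if (pkLen > 0) {
    copy = malloc(pkLen);
    if (copy == NULL) return 0;
    memcpy(copy, pk, pkLen);
  }
  memcpy(m->table, table, tableLen + 1);
  m->pk = copy;
  m->pkLen = pkLen;
  m->key = key;
  m->causalLength = causalLength;
  m->valid = 1;
  return 1;
}
```

A single entry is enough here because the N column changes of one row arrive consecutively; a miss simply falls back to the normal statement-backed lookup, so a stale-free but small cache costs nothing in correctness.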

Results (Release, Apple Silicon, single transaction)

| Scenario | Before | After | Speedup |
|---|---|---|---|
| Fresh import, 80k changes / 20k rows | 0.943s | 0.111s | 8.5× |
| Idempotent re-import (all changes lose) | 0.179s | 0.063s | 2.8× |
| 400k changes / 100k rows | ~4.7s | 0.64s (~625k changes/s) | ~7× |

Remaining profile is genuine btree work (column upsert + clock insert).

Testing

  • C unit suite passes; adds testRowMemoSavepointRollback covering memo invalidation across ROLLBACK TO (written with NDEBUG-immune checks, since assert is compiled out in Release)
  • All 155 Python correctness tests pass
  • Clean under Guard Malloc (make asan currently hangs in the ASAN runtime's own init on recent Xcode — pre-existing, unrelated)
  • Two-peer convergence verified end-to-end: concurrent conflicting inserts/updates/deletes across two tables, full + delta (db_version > ?) exchanges → byte-identical peers
  • core/test/perf/bench-import.sh added for repeatable measurement

Per merged change, the insert path paid a full prepare/finalize cycle,
several redundant statement executions and - worst of all - three hot
statements used RETURNING, which makes SQLite materialize an ephemeral
btree (with its own pager and page cache) on every executed row.

- Drop RETURNING from winner clock, pk lookaside, site_id ordinal and
  backfill inserts; read values from bound params or last_insert_rowid
- Bind unpacked pks directly to the key lookaside statements instead of
  round-tripping through a temporary 'SELECT ?,?,...' prepare per row
- Toggle the sync bit through a direct pointer instead of executing
  SELECT crsql_internal_sync_bit(x) statements per change
- Memoize (table, pk) -> (lookaside key, causal length) of the last
  merged row; changesets are ordered by (db_version, seq) so the N
  column changes of a row hit the memo back-to-back. Invalidated on
  commit/rollback, savepoint rollback (new vtab xRollback/xRollbackTo,
  module iVersion 2), local CRR writes, table info reloads, compaction
  and on any merge error (statement journal undo)
- Memoize the last site_id -> ordinal resolution (changesets are
  virtually always single-site)
- Compute next_db_version in C, bound as a value, and skip the
  PRAGMA data_version probe when pendingDbVersion is already set for
  the open transaction
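To illustrate the site_id -> ordinal memo from the list above, here is a small sketch under stated assumptions: site ids are treated as 16-byte blobs, and `SiteMemo`, `site_ordinal`, and `counting_lookup` are hypothetical names. The callback stands in for the real statement-backed lookup, which is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define SITE_ID_LEN 16 /* assumption: site ids are 16-byte blobs */

/* Stand-in signature for the slow, statement-backed ordinal lookup. */
typedef int64_t (*SlowOrdinalLookup)(const unsigned char *siteId, void *ctx);

/* Hypothetical single-entry memo of the last resolved site id. */
typedef struct SiteMemo {
  int valid;
  unsigned char siteId[SITE_ID_LEN];
  int64_t ordinal;
} SiteMemo;

/* Resolve a site id to its ordinal, consulting the memo first. A negative
 * result from the slow path is treated as an error and is not cached. */
static int64_t site_ordinal(SiteMemo *memo, const unsigned char *siteId,
                            SlowOrdinalLookup slow, void *ctx) {
  if (memo->valid && memcmp(memo->siteId, siteId, SITE_ID_LEN) == 0) {
    return memo->ordinal; /* hit: no statement executed */
  }
  int64_t ordinal = slow(siteId, ctx);
  if (ordinal >= 0) {
    memcpy(memo->siteId, siteId, SITE_ID_LEN);
    memo->ordinal = ordinal;
    memo->valid = 1;
  }
  return ordinal;
}

/* Demo slow path: counts its calls and fakes an ordinal from the first
 * byte of the site id. Purely illustrative. */
static int64_t counting_lookup(const unsigned char *siteId, void *ctx) {
  int *calls = ctx;
  (*calls)++;
  return siteId[0];
}
```

With a single-site changeset, every change after the first resolves from the memo, so the slow path runs once per import rather than once per change.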

Import of 80k changes (20k rows): 0.94s -> 0.11s. Idempotent re-import:
0.18s -> 0.06s. Adds a savepoint-rollback regression test (with
NDEBUG-immune asserts) and test/perf/bench-import.sh.

A site's col_version is monotonic per cell, so an incoming change whose
col_version equals the local clock entry AND whose site_id matches the
entry's author is the identical change: reject it without reading the
local value from the data table. The site ordinal rides along in the
col_version select (same clock row, no extra cost) and the incoming
site's ordinal resolves through the existing site_id memo.

True concurrent edits (equal versions from different sites) still fall
through to the deterministic value comparison, verified by an explicit
two-peer convergence check and the python correctness suite.
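
The decision described above can be sketched as a pure function. This is an illustration only: the `MergeDecision` enum and `decide_merge` are hypothetical names, and the rule that a strictly higher col_version wins while a strictly lower one loses is assumed here rather than stated in this commit message.

```c
#include <stdint.h>

/* Hypothetical outcome of comparing an incoming change to the local clock. */
typedef enum {
  MERGE_WIN,            /* incoming change replaces the local value */
  MERGE_LOSE,           /* incoming change is rejected */
  MERGE_COMPARE_VALUES  /* tie between sites: compare the values */
} MergeDecision;

/* Ordinals are the local integer ids that site_ids resolve to. */
static MergeDecision decide_merge(int64_t incomingVersion,
                                  int64_t incomingSiteOrdinal,
                                  int64_t localVersion,
                                  int64_t localSiteOrdinal) {
  /* Assumed baseline rule: the higher col_version wins outright. */
  if (incomingVersion > localVersion) return MERGE_WIN;
  if (incomingVersion < localVersion) return MERGE_LOSE;
  /* Equal versions from the same author: col_version is monotonic per
   * cell for a site, so this is the identical change. Reject it without
   * reading the local value from the data table. */
  if (incomingSiteOrdinal == localSiteOrdinal) return MERGE_LOSE;
  /* Equal versions from different sites: a true concurrent edit. */
  return MERGE_COMPARE_VALUES;
}
```

Only the last branch needs the data-table read, which is why an idempotent re-import, where every change matches its own clock entry, benefits most.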

Also folds the duplicated ordinal lookup in set_winner_clock into the
shared lookup_site_ordinal helper.

Idempotent re-import of 80k changes: 0.063s -> 0.035s (0.179s before
this PR).

@sinkingsugar merged commit d0540ac into pure-c-port on Jun 13, 2026
8 checks passed